
Author(s): 

ATASHI Hadi

Issue Info: 
  • Year: 2023
  • Volume: 3
  • Issue: 1
  • Pages: 5-16
Measures: 
  • Citations: 0
  • Views: 58
  • Downloads: 10
Abstract: 

Microarray technology is a powerful technique for measuring the expression levels of large numbers of genes simultaneously. Microarray data contain many sources of noise; therefore, several preprocessing steps are needed before the raw data can yield accurate analysis results. Preprocessing of microarray data includes background correction, normalization, and summarization steps, each of which can be performed by a wide variety of methods. However, the relative impact of these methods on the detection of differentially expressed genes remains to be determined. The aim of this study was to compare the effects of different preprocessing methods on the results of differentially expressed gene detection. The data were downloaded from the NCBI GEO database. The series (GSE) accession number, platform (GPL) accession number, and platform name of the data were GSE56589, GPL18534, and Affymetrix Bovine Genome Array, respectively. Two background correction methods (MAS.5 and RMA.2), two normalization methods (scaling normalization and quantile normalization), and two summarization methods (Tukey biweight and median polish) were evaluated. The results showed that the number and types of differentially expressed genes were affected mainly by the background correction and normalization methods, whereas the summarization method had only a small impact.
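
As an illustration of one of the normalization methods named in the abstract, the following is a minimal sketch of quantile normalization in plain Python. It is not the study's code (the abstract does not say which software was used); the function name `quantile_normalize`, the tie handling, and the intensity values are all assumptions made for this example.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length sample columns.

    Each value is replaced by the mean of the values that share its rank
    across all samples. Ties are broken by position, which is a
    simplification compared with production implementations.
    """
    n = len(columns[0])
    # Sort each sample, then average across samples rank by rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(col[r] for col in sorted_cols) / len(columns) for r in range(n)
    ]

    normalized = []
    for col in columns:
        # order[r] is the index of the value holding rank r in this sample.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical intensities: three arrays (samples), four probes each.
samples = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(samples)
```

After normalization every sample has the same set of values, so the arrays share one empirical distribution and differ only in which probe holds which rank.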

Yearly Impact: Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

Issue Info: 
  • Year: 2025
  • Volume: 8
  • Issue: 4
  • Pages: 51-66
Measures: 
  • Citations: 0
  • Views: 3
  • Downloads: 0
Abstract: 

Accurate and early prediction of diabetes is crucial for initiating prompt treatment and minimizing the risk of long-term health issues. This study introduces a comprehensive machine learning model aimed at improving diabetes prediction by leveraging two clinical datasets: the PIMA Indians Diabetes dataset and the Early-Stage Diabetes dataset. The pipeline tackles common challenges in medical data, such as missing values, class imbalance, and feature relevance, through a series of advanced preprocessing steps, including class-specific imputation, engineered feature construction, and SMOTETomek resampling. To identify the most informative predictors, a hybrid feature selection strategy is employed, integrating recursive elimination, Random Forest-based importance, and gradient boosting. Model training uses Random Forest and Gradient Boosting classifiers, which are fine-tuned and combined through weighted ensemble averaging to boost predictive performance. The resulting model achieves 93.33% accuracy on the PIMA dataset and 98.44% accuracy on the Early-Stage dataset, outperforming previously reported approaches. To enhance transparency and clinical applicability, both local (LIME) and global (SHAP) explainability methods are applied, highlighting clinically relevant features. Furthermore, probability calibration is performed to ensure that predicted risk scores align with true outcome frequencies, increasing trust in the model’s use for clinical decision support. Overall, the proposed model offers a robust, interpretable, and clinically reliable solution for early-stage diabetes prediction.
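
The weighted ensemble averaging step mentioned in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the function name, the weights (0.4 and 0.6), the 0.5 decision threshold, and the probability values are hypothetical and are not taken from the paper.

```python
def weighted_average_proba(probas, weights):
    """Combine per-model positive-class probabilities by weighted averaging."""
    total = sum(weights)
    n = len(probas[0])
    return [
        sum(w * p[i] for w, p in zip(weights, probas)) / total
        for i in range(n)
    ]


# Hypothetical predicted probabilities of diabetes for four patients.
rf_proba = [0.9, 0.2, 0.6, 0.4]   # stand-in for a Random Forest
gb_proba = [0.7, 0.1, 0.3, 0.8]   # stand-in for a Gradient Boosting model

# Assumed weights; in practice they would be tuned on validation data.
ensemble = weighted_average_proba([rf_proba, gb_proba], weights=[0.4, 0.6])

# Assumed 0.5 threshold to turn averaged probabilities into class labels.
labels = [1 if p >= 0.5 else 0 for p in ensemble]
```

Averaging probabilities rather than hard votes keeps the ensemble output usable as a risk score, which is what a later calibration step would operate on.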

Author(s): 

Issue Info: 
  • Year: 2020
  • Volume: 195
  • Issue: -
  • Pages: 0-0
Measures: 
  • Citations: 1
  • Views: 60
  • Downloads: 0
Keywords: 
Abstract: 

Journal: 

Arman Process Journal

Issue Info: 
  • Year: 2024
  • Volume: 5
  • Issue: 4
  • Pages: 1-14
Measures: 
  • Citations: 0
  • Views: 79
  • Downloads: 0
Abstract: 

Social networks are primarily represented and analyzed as graphs with a large number of vertices and edges, stored as an adjacency matrix. The edges indicate relationships between individuals and connect the vertices. The structural characteristics of each network are determined by the features of its edges and vertices. In this research, conducted on several types of social network data from the Stanford University database, a competitive colonial algorithm was used as a preprocessing step to select the features with the highest merit (lowest cost). To evaluate the impact of feature selection on the final output, experiments were run both with and without feature selection, using various algorithms commonly used in this field. Standard metrics such as accuracy, precision, sensitivity, and recall were measured independently on the output, averaged over 10 program executions. Comparing the results with and without feature selection showed a significant impact on all metrics of the final outcome. Many features in the datasets were either unused or contained minimal information. Retaining these features not only increased the computational burden and execution time but also reduced the accuracy of the output.
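
The evaluation described in the abstract (metrics averaged over repeated runs) can be sketched as below. The labels and predictions are hypothetical and only two runs are shown instead of ten; the function names are assumptions for this example. Sensitivity and recall are the same quantity, TP / (TP + FN), so the sketch reports it once.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision and recall (sensitivity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def average_metrics(runs):
    """Average each metric over a list of per-run metric dictionaries."""
    return {k: sum(r[k] for r in runs) / len(runs) for k in runs[0]}


# Hypothetical ground truth and predictions from two program executions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
run_1 = classification_metrics(y_true, [1, 0, 1, 0, 1, 0, 1, 0])
run_2 = classification_metrics(y_true, [1, 1, 1, 0, 0, 0, 0, 0])
averaged = average_metrics([run_1, run_2])
```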

Author(s): 

BaniMustafa Ahmed

Issue Info: 
  • Year: 2019
  • Volume: 11
  • Issue: 3
  • Pages: 79-89
Measures: 
  • Citations: 0
  • Views: 209
  • Downloads: 155
Abstract: 

This paper presents a data mining application in metabolomics. It aims to build an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying the biomarkers involved. To achieve this goal, a data-driven analysis is carried out using a public dataset of 1H-NMR metabolite profiles. This dataset suffers from class imbalance, which is known to degrade classifier performance and to undermine validity and generalizability. The classification models in this study were built using five machine learning algorithms: PLS-DA, MLP, SVM, C4.5, and ID3. The models were built after a number of intensive data preprocessing procedures designed to tackle the class imbalance and improve classifier performance. These procedures involve data transformation, normalization, standardization, re-sampling, and data reduction using a number of variable importance scorers. The best performance was achieved by an MLP model trained and tested with five-fold cross-validation on datasets that were re-sampled using the SMOTE method and then reduced using the SVM variable importance scorer. This model classified samples with excellent accuracy and identified the potential disease biomarkers. The results confirm the validity of metabolomics data mining for the diagnosis of cachexia. They also emphasize the importance of data preprocessing procedures such as sampling and data reduction for improving data mining results, particularly when the data suffer from class imbalance.
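
The core idea of SMOTE, the re-sampling method named in the abstract, is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below illustrates that idea in plain Python; it is a simplified stand-in rather than the paper's implementation, and the function name, `k`, the seed, and the data points are assumptions.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic minority samples by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class samples with two features each.
minority = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.0], [1.2, 1.8]]
new_samples = smote_sketch(minority, n_synthetic=5)
```

Because each synthetic point lies on a segment between two existing minority samples, the new samples stay inside the region the minority class already occupies.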

Author(s): 

CANNAS B.

Issue Info: 
  • Year: 2006
  • Volume: 31
  • Issue: 18
  • Pages: 1164-1171
Measures: 
  • Citations: 1
  • Views: 140
  • Downloads: 0
Keywords: 
Abstract: 

Journal: 

Issue Info: 
  • Year: 2008
  • Volume: 42
  • Issue: 3 (113)
  • Pages: 327-338
Measures: 
  • Citations: 0
  • Views: 2332
  • Downloads: 0
Abstract: 

Hyperspectral data potentially contain more information than multispectral data because of their higher spectral resolution. However, the stochastic data analysis approaches that have been applied successfully to multispectral data are not as effective for hyperspectral data. Various investigations indicate that the key cause of poor performance of stochastic approaches in hyperspectral data classification is inaccurate estimation of class parameters. It has been found that the conventional approaches can be retained if a preprocessing stage is introduced before the feature extraction procedure in the classification of hyperspectral data. This paper proposes a two-step preprocessing stage consisting of dimensionality reduction and class separability improvement. Sequential Parametric Projection Pursuit was used for dimensionality reduction because of its special characteristics. The Projection Pursuit algorithm estimates class parameters in a lower-dimensional space, which gives better parameter estimates. For class separability improvement, a lowpass filter was applied after dimensionality reduction. The paper shows that, for different numbers of features, classification accuracy improves when the preprocessing stage is applied.
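
The abstract does not specify which lowpass filter was used or along which axis it was applied. As a rough illustration of what lowpass filtering does to a feature sequence, the sketch below applies a simple moving average; the window size, the edge handling, and the data are assumptions for this example only.

```python
def moving_average(signal, window=3):
    """Smooth a 1-D sequence with a centered moving average.

    Near the edges the window is truncated, so the output has the same
    length as the input.
    """
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


# Hypothetical noisy values of one reduced feature along a row of pixels.
noisy = [1.0, 3.0, 2.0, 6.0, 1.0]
smoothed = moving_average(noisy)
```

Smoothing of this kind reduces within-class variation between neighbouring values, which is one plausible way a lowpass filter could improve class separability.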

Author(s): 

Issue Info: 
  • Year: 2017
  • Volume: 2
  • Issue: 1
  • Pages: 103-109
Measures: 
  • Citations: 1
  • Views: 107
  • Downloads: 0
Keywords: 
Abstract: 

Author(s): 

SOLGI A. | ZAREI H. | GOLABI M.R.

Issue Info: 
  • Year: 2017
  • Volume: 24
  • Issue: 2
  • Pages: 185-201
Measures: 
  • Citations: 0
  • Views: 1773
  • Downloads: 0
Abstract: 

Background and Objectives: Growing demand for water makes planning and management essential for controlling future water consumption. River flow prediction supports water resources management and can also help anticipate natural disasters such as floods and droughts. Accurate estimation of river flow with different models has therefore received attention from water resources researchers. Intelligent models have been used to predict river flow, and one that has performed well is Gene Expression Programming (GEP). Combining intelligent models with other techniques has recently gained acceptance, and the wavelet transform is usually used for this purpose. Materials and Methods: In this study, the GEP model was used to model flow in the Gamasiyab River on daily and monthly scales. Precipitation, temperature, evaporation, and flow data from Varayeneh Station on the Gamasiyab River for the period 1970 to 2012 were used. To increase the accuracy of the model, two data preprocessing methods were applied: the wavelet transform and principal components analysis (PCA). The primary signal of each input parameter was first decomposed using the wavelet transform. PCA was then used to determine the main sub-signals, which were entered as inputs into the GEP model to produce the Wavelet-Gene Expression Programming (WGEP) model. Results: Testing different structures of the GEP model showed that the model performed well on the daily scale, but its performance declined on the monthly scale. Comparison of the WGEP model with the GEP model showed that the hybrid model outperformed the simple model on both the daily and monthly scales, owing to the preprocessing applied to the data. The coefficient of determination of the hybrid model increased by 4% on the daily scale and by 23% on the monthly scale. In addition, because of the large number of sub-signals, using PCA increased the running speed. Conclusion: Data preprocessing improved the performance of the model, and using PCA as an auxiliary tool for the wavelet transform increased the speed and accuracy of the model. Overall, the results showed that the GEP model combined with the wavelet transform is a suitable tool for modeling and predicting the flow of the Gamasiyab River.
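
The decomposition of an input signal into sub-signals can be illustrated with a one-level discrete wavelet transform. The abstract does not name the mother wavelet or the number of decomposition levels, so the Haar wavelet, the single level, and the flow values below are assumptions chosen to keep the sketch short.

```python
import math


def haar_dwt(signal):
    """One-level Haar transform of an even-length sequence.

    Returns (approximation, detail): the low-frequency trend and the
    high-frequency fluctuation, each half the length of the input.
    """
    s = math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) / s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / s for i in pairs]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt, rebuilding the original sequence."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


# Hypothetical river flow series (eight time steps).
flow = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 3.0]
approx, detail = haar_dwt(flow)
rebuilt = haar_idwt(approx, detail)
```

In a hybrid model of the kind described, sub-signals such as `approx` and `detail` (from each input variable) would be candidates for the model inputs, with PCA used to keep only the most informative ones.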

Issue Info: 
  • Year: 2020
  • Volume: 30
  • Issue: 191
  • Pages: 22-30
Measures: 
  • Citations: 0
  • Views: 346
  • Downloads: 0
Abstract: 

Background and purpose: Information systems and databases are now widely used, and they should be combined with traditional methods to achieve higher accuracy and speed in diagnosing and preventing diseases and in choosing treatments. This study aimed to present an accurate system for diagnosing diabetes using data mining and a heuristic method that combines a neural network with particle swarm intelligence. Materials and methods: In this applied research, a particle swarm optimization algorithm was used during neural network training to determine the optimal weights of the network, using RapidMiner software on the Pima Indians Diabetes dataset of 768 patients. Results: The proposed algorithm was found to be in line with the real model. The highest accuracy, specificity, and sensitivity of the method across 50 different tests were 94.1%, 92.88%, and 92.12%, respectively. Conclusion: In this study, the average modeling error, used as the objective function, was minimized after a series of repetitions. Increasing the initial population and the number of repetitions improved the accuracy of the proposed method as well as its sensitivity and positive predictive value. The sensitivity and accuracy of the proposed method are higher than those of previous similar methods.
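
The particle swarm optimization loop described in the abstract can be sketched as follows. This is a generic, minimal PSO and not the RapidMiner setup used in the study: the objective is a toy quadratic standing in for the network's average modeling error, and the swarm size, iteration count, inertia weight `w`, and acceleration coefficients `c1` and `c2` are assumed values.

```python
import random


def pso_minimize(objective, dim, n_particles=20, n_iter=200,
                 bounds=(-5.0, 5.0), w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over `dim` dimensions with a basic particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_error(weights):
    """Stand-in objective: squared distance of the weights from zero."""
    return sum(v * v for v in weights)


best_weights, best_error = pso_minimize(toy_error, dim=2)
```

To train a neural network this way, each particle's position would encode the full weight vector and the objective would be the network's error on the training data.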
